Deep learning synthesis of cone-beam computed tomography from zero echo time magnetic resonance imaging

Cone-beam computed tomography (CBCT) produces high-resolution images of hard tissue even at small voxel sizes, but the process is associated with radiation exposure and poor soft tissue imaging. Thus, we synthesized CBCT images from magnetic resonance imaging (MRI) using deep learning and assessed their clinical accuracy. We collected data from patients who underwent both CBCT and MRI simultaneously in our institution (Seoul). MRI data were registered with CBCT data, and both data sets were prepared into 512 slices each of axial, sagittal, and coronal sections. A deep learning-based synthesis model was trained, and the output data were evaluated by comparing the original CBCT and the synthetic CBCT (syCBCT). According to expert evaluation, syCBCT images showed better performance in terms of artifact and noise criteria but had poorer resolution than the original CBCT images. In syCBCT, hard tissue showed better clarity, with significantly different MAE and SSIM. These results could provide a basis for replacing CBCT with non-radiation imaging, which would be helpful for patients planning to undergo both MRI and CBCT.

Data preparation. Paired image data were registered because patient orientation differed between image acquisitions. The entire registration process was conducted with ITK-SNAP (ver. 3.0, www.itksnap.org). The gross orientation of MRI (anterior-posterior position) was matched manually with the CBCT orientation (superior-inferior position). Then, geometrical rigid registration was conducted until the mutual information between the two images reached its maximum 16.
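The registration criterion described above can be illustrated with a small estimate of mutual information from a joint intensity histogram. This is only a sketch under stated assumptions: the function name `mutual_information`, the bin count, and the intensity range are hypothetical, and it is not the implementation used inside ITK-SNAP, which the text does not describe.

```python
import math
from collections import Counter


def mutual_information(fixed, moving, bins=8, lo=0.0, hi=1.0):
    """Estimate mutual information (in nats) between two equally sized
    intensity sequences using a joint histogram with `bins` bins per axis.
    The bin count and intensity range are illustrative assumptions."""
    if len(fixed) != len(moving):
        raise ValueError("images must contain the same number of voxels")

    def to_bin(value):
        # Clamp into [lo, hi] and map to an integer bin index.
        t = (min(max(value, lo), hi) - lo) / (hi - lo)
        return min(int(t * bins), bins - 1)

    n = len(fixed)
    joint = Counter((to_bin(a), to_bin(b)) for a, b in zip(fixed, moving))

    # Marginal histograms are obtained by summing the joint histogram.
    marg_fixed, marg_moving = Counter(), Counter()
    for (i, j), count in joint.items():
        marg_fixed[i] += count
        marg_moving[j] += count

    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) written with raw counts.
        mi += p_xy * math.log(count * n / (marg_fixed[i] * marg_moving[j]))
    return mi
```

A rigid registration driven by this criterion would evaluate the value for each candidate rotation and translation of the moving image and keep the transform with the highest value. An image compared with itself gives the largest value, while a constant image carries no information about the other and gives zero.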
Then, the MRI images were resliced to the same slice thickness as the CBCT images, 300 µm. Five hundred and twelve CBCT and MRI axial slices were prepared. For data augmentation, the axial image data were reconstructed into 512 coronal and 512 sagittal slices, and all images were saved in BMP format. In total, 64,512 images of CBCT and MRI data (3,072 images per data pair) were prepared.
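The image counts above follow from simple arithmetic: 512 slices in each of three orientations for each of two modalities gives 3,072 images per pair, and 21 pairs (the 16 training and 5 test sets reported in the training section) give 64,512 images. The sketch below shows the reslicing idea on a nested-list volume. The axis convention `volume[z][y][x]` and the function names are assumptions made for illustration.

```python
def orthogonal_slices(volume):
    """Return (axial, coronal, sagittal) slice lists from a volume indexed
    as volume[z][y][x]. The axis convention is an assumption."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Axial: one 2-D image per z index.
    axial = [[row[:] for row in volume[z]] for z in range(nz)]
    # Coronal: one 2-D image per y index, rows run along z.
    coronal = [[volume[z][y][:] for z in range(nz)] for y in range(ny)]
    # Sagittal: one 2-D image per x index, rows run along z, columns along y.
    sagittal = [[[volume[z][y][x] for y in range(ny)] for z in range(nz)]
                for x in range(nx)]
    return axial, coronal, sagittal


def images_per_pair(n_slices=512, n_orientations=3, n_modalities=2):
    """Number of 2-D images produced from one MRI-CBCT pair."""
    return n_slices * n_orientations * n_modalities


if __name__ == "__main__":
    print(images_per_pair())        # images per MRI-CBCT pair
    print(images_per_pair() * 21)   # images for 21 pairs
```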
Deep learning network training. A modified U-Net structure was used for our synthesis model. U-Net is commonly applied to biomedical imaging tasks, as it achieves relatively high accuracy with a small number of source images compared with other existing networks 17. To improve performance by extracting more hierarchical features than the original U-Net, we modified several parts of the network structure, as illustrated in Fig. 1. First, the encoder structure was substituted with the Bottleneck blocks of ResNet-50 18, and all 2-dimensional convolution layers were changed into 3-dimensional convolution layers. Second, the last skip connection of the U-Net was removed because the minute registration error between MRI and CBCT makes the morphology of the synthesized prediction ambiguous, and the different patterns of the input MRI can affect the results. Lastly, to prevent the model capacity from exceeding our hardware memory size, the number of convolution kernels was changed, as described in Fig. 1a. Ablation studies were performed for each proposed component.
Sixteen sets of MRI-CBCT pairs were used for training the synthesis model, and five sets were used as test sets. For pre-processing, we multiplied each MRI-CBCT pair by a circular binary mask with a radius of 256 pixels to remove the background noise (Fig. 1b). Then, the masked images were stacked in the vertical direction to reconstruct a 3-D image of size 512 × 512 × 512. Because the field-of-view sizes differ, the peripheral area is lost in some images of the MRI and CBCT sequences. These noisy sequences were excluded from the training step to ensure stable network training. Only sequences 21-490, 1-360, and 41-380 along the x, y, and z axes of the entire image, respectively, were used. To overcome the hardware limitation (memory size) and to perform data augmentation, we randomly extracted patches from the whole image. The experiments were conducted with two different patch sizes: a large patch of size 128 × 128 × 16 and a small patch of size 64 × 64 × 16, as illustrated in Fig. 1c.
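The masking and patch sampling steps above can be sketched as follows. Several details are assumptions rather than facts from the study: the mask is centered on the image, the reported sequence ranges are treated as 1-based and inclusive, the patch dimensions are mapped to the (x, y, z) axes in that order, and the names `circular_mask`, `VALID_RANGES`, and `random_patch_bounds` are hypothetical.

```python
import random


def circular_mask(size=512, radius=256):
    """Binary mask (list of rows) that is 1 inside a centered circle.
    Centering the circle on the image is an assumption."""
    c = (size - 1) / 2.0
    return [[1 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0
             for x in range(size)] for y in range(size)]


# Usable index ranges reported in the text, read as 1-based and inclusive.
VALID_RANGES = {"x": (21, 490), "y": (1, 360), "z": (41, 380)}


def random_patch_bounds(rng, patch=(64, 64, 16), valid=VALID_RANGES):
    """Draw a random patch that lies entirely inside the usable ranges.
    Returns {axis: (first_index, last_index)} with inclusive bounds."""
    bounds = {}
    for axis, size in zip(("x", "y", "z"), patch):
        lo, hi = valid[axis]
        start = rng.randint(lo, hi - size + 1)  # randint is inclusive
        bounds[axis] = (start, start + size - 1)
    return bounds


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_patch_bounds(rng, patch=(128, 128, 16)))  # large patch
    print(random_patch_bounds(rng, patch=(64, 64, 16)))    # small patch
```

Sampling patch positions at random in every iteration acts as data augmentation, since the network rarely sees exactly the same sub-volume twice.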
The network was trained with the Adam optimizer at an initial learning rate of 2.5 × 10⁻⁴, which was exponentially decayed by a factor of 0.8 every 200 iterations, with a weight decay of 10⁻⁵. The smooth L1 loss was used, together with early stopping with a stopping factor of 5. The mini-batch sizes were 32 and 8 for the small and large patches, respectively. The input patches were normalized to [-1, 1].
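The training schedule above can be sketched with a few helper functions. The text does not state whether the decay is applied stepwise or continuously, how the stopping factor of 5 is defined, or how the normalization bounds are chosen. The sketch therefore assumes a stepwise decay, reads the stopping factor as a patience of five evaluations, and uses min-max scaling. All three choices and all function names are hypothetical.

```python
def learning_rate(iteration, base_lr=2.5e-4, gamma=0.8, step=200):
    """Stepwise exponential decay: multiply by `gamma` every `step`
    iterations (the stepwise form is an assumption)."""
    return base_lr * gamma ** (iteration // step)


def should_stop(val_losses, patience=5):
    """Patience-based early stopping: stop when none of the last
    `patience` validation losses improves on the earlier best.
    Reading the 'stopping factor of 5' as a patience is an assumption."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


def normalize(value, vmin, vmax):
    """Min-max scaling of an intensity to [-1, 1] (assumed scheme)."""
    return 2.0 * (value - vmin) / (vmax - vmin) - 1.0


if __name__ == "__main__":
    for it in (0, 199, 200, 400, 1000):
        print(it, learning_rate(it))
```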
In the inference phase, an input MRI image was partitioned into patches using a sliding window method, with the step size being half the patch size (Fig. 1d). The trained synthesis model predicted the CBCT patches. Then, each patch was weighted by a Gaussian filter to generate a smooth cross-section of the 3-D synthetic CBCT (syCBCT). Finally, the syCBCT was merged by overlaying the weighted patches with the same stride as the sliding window.
The two models were superimposed for measurement using Geomagic Control X (3D Systems, Cary, NC, USA). Then, the overall surface deviation was acquired (Fig. 2a) for syCBCT based on both large and small patches. The surface deviation of syCBCT was also obtained for anatomical regions, namely the maxilla and mandible with respect to the axial plane and the anterior and posterior regions with respect to the coronal plane (Fig. 2b). The reference planes were determined by following a previous study 19. The axial plane was determined by the cementoenamel junction of the upper and lower teeth. Anatomical landmarks, including the mental foramen (anterior) and mandibular foramen (posterior), were used to determine the coronal plane. All measured deviation values were obtained as root mean square (RMS, mm).
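The sliding-window inference with Gaussian-weighted patch merging described in this section can be sketched in one dimension. This is a minimal illustration, not the study's code: the Gaussian width, the normalization by the accumulated weights, and the names `gaussian_window` and `sliding_window_merge` are assumptions, and `predict` is a stand-in for the trained synthesis model.

```python
import math


def gaussian_window(size, sigma=None):
    """Gaussian weights centered on the patch. sigma = size / 4 is an
    illustrative choice, not a value taken from the text."""
    sigma = sigma if sigma is not None else size / 4.0
    c = (size - 1) / 2.0
    return [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]


def sliding_window_merge(signal, patch_size, predict):
    """Run `predict` on overlapping patches (stride = half the patch size)
    and blend the outputs with Gaussian weights."""
    step = patch_size // 2
    n = len(signal)
    weights = gaussian_window(patch_size)
    acc = [0.0] * n    # weighted sum of predictions
    norm = [0.0] * n   # sum of weights at each position
    for start in range(0, n - patch_size + 1, step):
        out = predict(signal[start:start + patch_size])
        for i, value in enumerate(out):
            acc[start + i] += weights[i] * value
            norm[start + i] += weights[i]
    # Positions never covered by a patch are left at zero.
    return [a / w if w > 0 else 0.0 for a, w in zip(acc, norm)]


if __name__ == "__main__":
    mri_line = [float(i) for i in range(32)]
    # Identity "model": the merged output should reproduce the input.
    print(sliding_window_merge(mri_line, 8, lambda patch: patch)[:8])
```

Because the weights are highest at the patch center, predictions near patch borders, where a network typically has the least context, contribute less to the merged volume, which reduces visible seams between patches.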
Expert image quality evaluation. Two radiologists with more than 10 years of experience conducted a subjective evaluation using a modified version of the clinical image evaluation chart for CBCT provided by the Korean Academy of Oral and Maxillofacial Radiology (Table 1). The clinical image evaluation chart comprises four sections: artifact, noise, resolution, and overall image. In the artifact, noise, and resolution sections, the evaluators graded each image series as poor, moderate, or good. For the overall grade, the possible outcomes were no diagnostic value, poor, moderate, or good.
Image quality evaluation metrics. For the five sets of test data, the image quality of the syCBCT in the axial series was compared with that of the original CBCT image using three indices that are frequently used to evaluate synthetic images 20: mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). MAE correlates with the image noise level, PSNR is closely related to the clarity and resolution of the image, and SSIM reflects the overall structural similarity of the synthetic image. The definition and ideal reference value 18 of each index were as follows:
All metrics were obtained according to the ability to present hard tissue, soft tissue, and air in syCBCT compared to the original CBCT 20.
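For orientation, the three indices can be computed as in the sketch below, which uses their commonly cited textbook forms. These may differ in detail from the definitions used in the study. In particular, SSIM is usually computed over local windows and averaged, whereas the `global_ssim` function here uses a single global window for brevity. The constants `k1` and `k2` are the commonly used defaults, and the function names and the `data_range` parameter are assumptions.

```python
import math


def mae(ref, syn):
    """Mean absolute error between two equally sized voxel sequences."""
    return sum(abs(a - b) for a, b in zip(ref, syn)) / len(ref)


def psnr(ref, syn, data_range):
    """Peak signal-to-noise ratio in dB. `data_range` is the span of
    possible intensities (for example, 255 for 8-bit images)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, syn)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(ref, syn, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM (a simplification of the usual local-window
    version). k1 and k2 are the commonly used default constants."""
    n = len(ref)
    mu_x, mu_y = sum(ref) / n, sum(syn) / n
    var_x = sum((a - mu_x) ** 2 for a in ref) / n
    var_y = sum((b - mu_y) ** 2 for b in syn) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, syn)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


if __name__ == "__main__":
    cbct = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0]
    sycbct = [v + 10.0 for v in cbct]
    print(mae(cbct, sycbct), psnr(cbct, sycbct, 255), global_ssim(cbct, sycbct, 255))
```

A tissue-wise evaluation, as described for hard tissue, soft tissue, and air, would pass only the voxels belonging to one tissue class to these functions. Lower MAE, higher PSNR, and SSIM closer to 1 indicate closer agreement with the original CBCT.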

Statistical analysis and comparisons.
To measure the surface deviation of the large and small patch-based 3-D models, RMS values were compared using the Mann-Whitney test. The deviation at the anatomical regions (maxilla, mandible, posterior, and anterior) was compared using the Kruskal-Wallis test and Dunn's multiple comparison post-hoc test. The number of grades from the clinical CBCT image evaluation chart according to each criterion (artifact, noise, resolution, and overall) was also assessed for the original CBCT and syCBCT images. Inter-observer agreement was assessed with the intraclass correlation coefficient (ICC). The image quality metrics (MAE, PSNR, and SSIM) were compared among hard tissue, soft tissue, and air in individual syCBCT using one-way ANOVA. Statistical analysis was conducted with GraphPad Prism version 9.4.1 (GraphPad Software, La Jolla, CA, USA, www.graphpad.com) with a 95% confidence interval.

Results
The mean surface deviation was 2.95 ± 0.35 and 2.93 ± 0.39 mm for large and small patch-based syCBCT, respectively, with no statistically significant difference. Four small patch-based 3D models showed less surface deviation than the corresponding large patch-based models, while one small patch-based 3D model (syCBCT2) showed more surface deviation than the large patch-based model (Table 2, Fig. 3). For the deviation measured at different anatomical regions, the anterior region showed larger deviation (large patch, 3.76 mm; small patch, 4.01 mm), and the maxilla showed smaller deviation (large patch, 3.09 mm; small patch, 2.81 mm) (Table 2). The mean surface deviation differed significantly between the maxilla and the anterior region in small patch-based models.
Expert image quality evaluation showed that syCBCT performed better than the original CBCT in terms of the artifact and noise criteria. In contrast, the original CBCT obtained a 'good' grade for the resolution criterion (Figs. 4, 5). All original CBCTs showed a 'good' grade for the overall image, while only one syCBCT based on small patch models showed a 'good' grade (Fig. 4d). The ICC between the evaluators was 0.85.
The proposed network introduced structural changes based on U-Net and applied a Gaussian filter at post-processing. The ablation studies for each proposed component were performed for the small patch, and the corresponding results are listed in Supplementary Table S1. Among the image quality metrics, MAE and SSIM showed significantly better performance in evaluating hard tissue structures (Table 3). However, PSNR showed the best performance in describing air. The three tissue types showed significantly different levels of image quality according to all indices. Additionally, all indices except SSIM showed better performance in small patch-based syCBCT than in large patch-based syCBCT for hard tissue.

Discussion
This study was the first attempt to synthesize dental CBCT images from ZTE MRI images using deep learning. It is an important attempt at a time when the need for radiation-free and low-dose dental imaging is increasing. As a result of this study, syCBCT images comparable to the CBCT images used at present were achieved. The image quality indices, MAE, PSNR, and SSIM, showed acceptable values in the current study compared with previous medical image synthesis studies. Notably, the syCBCT image was superior to the original CBCT image in terms of artifacts and noise, although the resolution was insufficient. In addition, 3D model manipulation, which has been challenging with MRI, was shown to be feasible in this study.
Table 2. Mean value of surface deviation in the overall three-dimensional models and the respective anatomical regions. *Mann-Whitney test, 95% confidence interval. **Kruskal-Wallis test and Dunn's multiple comparisons post-hoc test, 95% confidence interval.
It was significant that the syCBCT showed improvement in image artifacts and noise compared with the original CBCT. These unexpected results have not been reported in any previous studies on CT image synthesis based on MRI data, probably because all of those studies focused on multichannel CT rather than CBCT 8,9,11. Traditionally, compared with multichannel CT, CBCT is known to produce images with extensive noise and artifacts due to its low radiation dose and cone-shaped beam. Many researchers have tried to reduce scattering noise and artifacts in CBCT since its introduction in dentistry 21,22. Although due to a different phenomenon, MRI also produces highly noisy images with artifacts. Thus, we did not expect to obtain improved syCBCT from MRI.
Clinical imaging evaluation showed that the resolution of syCBCT was unsatisfactory in this study. The original CBCT showed a good to moderate grade of resolution, while syCBCT showed a poor to moderate grade. This was consistent with the image quality metrics. The value of PSNR, which represents the clarity of the image, was lower than that in previous similar studies 23. Among several possible reasons, the relatively low sharpness of hard tissue structures in MRI is the primary candidate. Although the voxel size and slice thickness of the original MRI data were within the range of clinically used CBCT units, the relatively low sharpness of the bone margin was considered an inherent limitation of the imaging modality itself. This limitation needs to be addressed through the development of additional advanced image post-processing techniques.
Meanwhile, the image noise and artifact levels were improved, with lower MAE values than those in previous studies 23. Also, the SSIM value in our study, which indicates overall image quality, was comparable to that of previous studies 23. Although the clarity of the syCBCT image was low in the current study, the overall image quality was comparable to that of previous studies owing to reduced noise and artifacts.
The blurred margins and low sharpness of anatomic structures in synthetic CT images have been an issue in deep-learning-based CT image synthesis 7,14,24, and a similar tendency was shown in our study. Leynes et al. 24 mentioned that gross bone depiction in synthetic CT was comparable to that in the original CT image, whereas finer bone structures were difficult to depict. Han 7 also reported that the error in synthetic CT mainly occurred at the border of bone tissue. Yuan et al. 14 studied the production of synthetic CT from fast-scan CBCT based on deep-learning models and stated that small fine details were not preserved in synthetic CT images. The overall resolution of synthetic CT was poorer than that of the original CBCT image. To overcome this problem, Chen et al. 15 pre-processed multichannel CT using an up-sampling method, which turned the multichannel CT images into images with higher resolution. Accordingly, the synthetic image output was expected to show improved sharpness and clarity. However, they noted that, despite these efforts, deformation still tended to appear in the output image 15.
In the training step, we adopted patches of two different sizes as input. With small patches, we expected more precise results with less distortion than with large patches, because the network can concentrate on the delicate morphology of a small region. As a result, improved performance was obtained in the image quality metrics, surface deviation, and expert image quality evaluation; however, the differences were not statistically significant. Thus, further research on image pre-processing that enhances the sharpness of input images is needed. In addition, we suggest that excluding patches that contain registration errors due to postural differences from the training step will help to improve the quality of syCBCT. Further, one issue with comparing surface deviation in the 3D maxillofacial model was that the model file contains errors introduced by converting the file type from the original image format. Therefore, the deviations of a few millimeters should be interpreted as relative errors across input data types and facial regions rather than as absolute errors.
Chen et al. 15 mentioned misregistration of the image sets as a possible reason for synthetic image deformation. The current study also involved registration between MRI and CBCT. In particular, the MRI images used in this study could not be completely registered with the CBCT images owing to differences in patient posture between the two imaging procedures. Additionally, the MRI used in this study was acquired for TMJ evaluation, and the image signal of the lower submental area, which was relatively far below the TMJ, was not sufficient for accurate model training. Hence, a prospective study design should be established to develop deep-learning models that can synthesize more accurate CBCT images.
Here, a modified U-Net structure with a ResNet backbone was used. Gholamiankhah et al. and Bahrami et al. compared GAN, eCNN, U-Net, and V-Net with ResNet and concluded that ResNet showed the best performance in CT synthesis from MRI 7,12. We also adopted ResNet to take advantage of its feature extraction capability and removed the last skip connection of U-Net to reduce the disturbance from the inevitable registration errors in our dataset. We confirmed through the ablation studies that each component of the proposed method improved the quality of syCBCT (see Supplementary Table S1). Although not all indices showed the best performance, the SSIM, which is known to be close to human visual perception, showed the highest values in the proposed model of the current study. The MAE and PSNR of hard tissue were degraded in the proposed model compared with the previous studies 8,17; however, the difference was too minute to be detected by the human naked eye (Supplementary Figure S1). Additionally, when our proposed method was compared with the existing methods of U-Net 17 and Han et al. 8, the proposed method generally showed superior performance in the image quality indices (see Supplementary Table S2). Han et al. 8 avoided using 3-dimensional convolution filters because of concerns about the GPU memory limit. We changed the number of kernels in each convolution layer to handle this issue. This modification increased the efficiency of the network by reducing the number of model parameters from 31 million (U-Net) to 10 million (ours).
Table 3. Image quality evaluation metrics based on hard tissue, soft tissue, and air evaluation of the image. *One-way ANOVA with 95% confidence interval.
Previous studies utilized an adversarial learning strategy 7,25, which trains a synthesis model together with a discriminator that tries to classify target images as either real or synthesized. However, adversarial learning is known to be challenging to optimize because of "mode collapse," in which a synthesis model keeps generating identical samples 26. To prevent mode collapse, we used the smooth L1 loss instead of an adversarial loss. The smooth L1 loss computes pixel-wise differences between the original and synthesized images and is more robust to outliers than the mean squared error loss. In our experiment, CBCT artifacts are considered outliers, as they show larger values than other areas. Therefore, the smooth L1 loss was thought to be effective in reducing syCBCT artifacts in this study.
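The robustness argument above can be made concrete with a small comparison of the smooth L1 loss and the mean squared error on residuals that contain one large outlier. This is a sketch of the standard smooth L1 definition. The transition point `beta = 1.0` is an assumed value, since the text does not report the setting used in the study, and the function names are hypothetical.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for |error| < beta, linear otherwise.
    beta = 1.0 is an assumed transition point."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def mse(pred, target):
    """Mean squared error, shown for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    target = [0.0, 0.0, 0.0]
    pred = [0.1, 0.1, 5.0]  # the last voxel mimics an artifact-like outlier
    print("smooth L1:", smooth_l1(pred, target))
    print("MSE:", mse(pred, target))
```

Because the penalty grows only linearly for large errors, a single artifact-like voxel contributes far less to the smooth L1 loss than to the mean squared error, so the network is pushed less strongly toward reproducing such outliers.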
There are several limitations to this study. First, although the sample size used in this study was comparable to that of previous studies, better model performance could be achieved with more samples, given the nature of deep learning research. Further research with additional MRI and CBCT data sets would help to increase the accuracy of the synthetic image. Additionally, as mentioned above, because the patient's position differed between MRI and CBCT, perfect registration could not be achieved, leading to errors in the CBCT image output. In this study, the registration process was conducted using commercial software, whereas a more sophisticated registration procedure is required. Lastly, MRI source data with higher image quality, especially in the mandible area, would yield better results than those of the current study. Thus, a solid prospective study design is required to develop more advanced CBCT synthesis models.

Conclusion
This study provided the first approach to CBCT synthesis from ZTE MRI, a non-ionizing imaging modality. Compared with the conventional CBCT image, the generated CBCT image showed a clinically applicable level of quality for dentistry, with improved image quality in terms of noise and artifacts. The results are expected to provide a basis for non-ionizing imaging with improved quality that could replace CBCT for patients planning to undergo both MRI and CBCT simultaneously.

Data availability
The data generated and analyzed during the current study are not publicly available due to privacy laws and policies in Korea, but are available from the corresponding author on reasonable request.